Correcting transcribed audio files with an email-client interface

ABSTRACT

Methods and systems for requesting a transcription of audio data. One method includes displaying a send-for-transcription button within an email-client interface on a computer-controlled display, and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button.

RELATED APPLICATIONS

The present application is a continuation-in-part of International Application PCT/US2007/066791 filed on Apr. 17, 2007, which claims priority to U.S. Provisional Application 60/792,640 filed on Apr. 17, 2006, the entire contents of which are both hereby incorporated by reference. The present application also claims priority to U.S. Provisional Application 60/992,187 filed on Dec. 4, 2007; U.S. Provisional Application 61/005,456 filed on Dec. 4, 2007; and U.S. Provisional Application 61/076,054 filed on Jun. 26, 2008, the entire contents of which are all hereby incorporated by reference.

BACKGROUND OF THE INVENTION

Each day individuals and companies receive multiple audio messages. These audio messages can include personal greetings and information or business-related instructions and information. In either case, it may be useful or required that the audio messages be transcribed in order to create written records of the messages.

Software currently exists that generates written text based on audio data. For example, Nuance Communications, Inc. provides a number of software programs, trademarked “Dragon,” that take audio files in .WAV format, .MP3 format, or other audio formats and translate such files into text files. The Dragon software also provides mechanisms for comparing audio files to text files in order to “learn” and improve future transcriptions. The “learning” mechanism included in the Dragon software, however, is only intended to learn based on a voice-dependent model, which means that the same person trains the software program over time. In addition, learning mechanisms in existing transcription software are often non-continuous and include set training parameters that limit the amount of training that is performed.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide methods and systems for correcting transcribed text. One method includes a user sending, via an email-client interface, one or more emails that include audio data to a transcription server. The emails may be sent from one or more data sources running email-clients and include audio data to be transcribed. The audio data is transcribed based on a voice model to generate text data. The method also includes making the text data available to the user over at least one computer network and receiving corrected text data over the at least one computer network from the user. In addition, the method includes modifying the voice model based on the corrected text data.

Embodiments of the present invention also provide systems for correcting transcribed text. One system includes a transcription server, at least one translation server, an email-client correction interface, and at least one training server. The transcription server receives audio data from one or more audio data sources and the translation server can transcribe the audio data based on a voice model to generate text data. The email-client correction interface is accessible by a user from within an email-client and provides the user with access to the text data. The transcription server also receives corrected text data from the user. The training server then modifies the voice model based on the corrected text data.

Additional embodiments of the invention also provide methods of performing audio data transcription. One method includes obtaining audio data from at least one audio data source, such as a voice over IP system or a voicemail system, transcribing the audio data based on a voice-independent model to generate text data, and sending the text data to an owner of the audio data as an email message.

Embodiments of the invention also provide a method of requesting a transcription of audio data. The method includes displaying a send-for-transcription button within an email-client interface on a computer-controlled display, and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button.

Further embodiments of the invention provide a system for requesting a transcription of audio data. The system includes a transcription server and an email-client interface. The email-client interface displays at least one email message associated with audio data to a user, displays a send-for-transcription button to the user, receives a selection of the at least one email message from the user, receives a selection of the send-for-transcription button from the user, and automatically sends the at least one email message and associated audio data to the transcription server as a request for a transcription of the associated audio data in response to the user's selection of the send-for-transcription button.

Additional embodiments of the invention also provide a system for generating a transcription of audio data. The system includes a transcription server and a translation server. The transcription server is configured to receive at least one email message and associated audio data from an email-client, identify an account based on the at least one email message, and obtain stored account settings associated with the identified account. The translation server is configured to generate a transcription of the associated audio data based on the account settings and a voice-independent model.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings:

FIGS. 1 and 2 schematically illustrate systems for transcribing audio data according to various embodiments of the invention.

FIG. 3 illustrates an email-client interface according to an embodiment of the invention.

FIG. 4 illustrates a process for transcribing audio data using the email-client interface according to an embodiment of the invention.

FIG. 5 illustrates the transcription server of FIGS. 1 and 2 according to an embodiment of the invention.

FIG. 6 illustrates a file transcription, correction, and training method according to an embodiment of the invention.

FIG. 7 illustrates another file transcription, correction, and training method according to an embodiment of the invention.

FIG. 8 illustrates a correction method according to an embodiment of the invention.

FIGS. 9-10 illustrate a correction notification according to an embodiment of the invention.

FIGS. 11-14 illustrate an email-client correction interface according to an embodiment of the invention.

FIG. 15 illustrates a message notification according to an embodiment of the invention.

DETAILED DESCRIPTION

Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the following drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways.

In addition, it should be understood that embodiments of the invention include hardware, software, and electronic components or modules that, for purposes of discussion, may be illustrated and described as if the majority of the components were implemented solely in hardware. However, based on a reading of this detailed description, one of ordinary skill in the art would recognize that, in at least one embodiment, the electronic based aspects of the invention may be implemented in software. As such, it should be noted that a plurality of hardware and software based devices, as well as a plurality of different structural components, may be utilized to implement the invention. Furthermore, and as described in subsequent paragraphs, the specific configurations illustrated in the drawings are intended to exemplify embodiments of the invention. Other alternative configurations are possible.

FIG. 1 illustrates a transcription system 10 for transcribing audio data according to an embodiment of the invention. As shown in FIG. 1, the system 10 includes a transcription server 20, a data source running an email-client 30, and a third party device 40. The transcription server 20 includes, among other things, a voice file directory 52, a queue server 54, and a translation server 56. The transcription server is described in more detail below. The data source email-client 30 and the third party device 40 can be connected to the transcription server 20 via a wide area network 50 such as a cellular network or the Internet.

Information flow through the system 10 begins in the data source email-client 30. The data source email-client 30 can include a stand-alone email-client, such as Outlook manufactured by Microsoft™ or Lotus Notes manufactured by IBM™. In other embodiments, the data source email-client 30 can include a browser-based email-client, such as Hotmail, Gmail, Yahoo, AOL, etc. As described below, in addition to providing standard emailing operations, the data source email-client 30 can provide one or more email-client interfaces (e.g., via one or more plug-ins or additional software modules installed and used as part of the email-client 30) that allow a user to request, view, manage, and correct transcribed text data.

A user sends information from the data source email-client 30 through the wide area network 50 (e.g., a cellular network, the Internet, etc.) to the transcription server 20. The transcription server 20 places the information in the voice file directory 52 related to an account for the user that sent the information. The information to be transcribed is placed in the queue server 54 before being routed to the translation server 56 to be transcribed. After the information has been transcribed, it is sent back through the wide area network 50 and may, optionally, be sent to a third party device 40 for correction. In some embodiments, if the information is not sent to a third party device 40 for correction or if the third party device 40 has finished correcting the transcription, the information is sent back to the data source email-client 30.

FIG. 2 illustrates an exemplary embodiment of the system 10 from FIG. 1. The transcription server 20 can include or can be connected to an email server 20 a that receives email messages from a client computer 30 a or other devices running email-clients, such as a personal digital assistant (“PDA”) 30 b, a Blackberry device 30 c, or a mobile phone 30 d. In other embodiments, additional devices that support email-clients may also be used. The system 10 also includes a third party device 40. The third party device 40 can receive messages including transcribed text to be corrected or checked before the text is sent back to the user. As described below, in some embodiments, the third party device 40 provides one or more email-client interfaces for viewing and correcting transcribed text.

FIG. 3 illustrates an embodiment of an email-client interface 60. The email-client interface 60 allows a user to interact with the transcription server 20 from FIGS. 1 and 2. In some embodiments, the email-client interface 60 is provided through an email-client, such as the data source email-client 30. The email-client can include a stand-alone email-client, such as Outlook manufactured by Microsoft™ or Lotus Notes manufactured by IBM™. In other embodiments, the email-client can include a browser-based email-client, such as Hotmail, Gmail, Yahoo, AOL, etc. In some embodiments, the email-client interface 60 is provided by a plug-in or additional software module that is installed and used with the email-client, which allows a user to access and manage transcribed text from within a standard email-client and without having to launch and access a separate interface for managing transcribed text.

As shown in FIG. 3, the email-client interface 60 includes a send button 62, a quick play button 64, a search field 66, and an options button 68. The send button 62 allows the user to send one or more selected email messages that include audio data to the transcription server 20. The search field 66 allows a user to search messages that have already been sent to the transcription server 20. As a result, the search field 66 allows a user to access information within the transcription system 10 without having to access a web interface. The quick play button 64 allows the user to play audio data related to a message that has already been sent to the transcription server 20. The options button 68 allows a user to modify features related to the email-client interface 60 and an email-client correction interface described below. In some embodiments, the options button 68 allows a user to modify account settings related to delivery settings, transcription settings, format settings, and the like. In other embodiments, the email-client interface 60 includes additional buttons and functionality.

In conjunction with the email-client interface 60, the email-client correction interface is also accessed from within an email-client, such as the data source email-client 30 or an email-client executed by the third party device 40. In some embodiments, the email-client correction interface is also provided by a plug-in or additional software module that is installed and used with the email-client. The email-client correction interface can be part of the same plug-in providing the email-client interface 60.

The email-client correction interface allows a user to access a web-based correction interface from within an email-client, eliminating the need to launch a separate web browsing application or interface. Aspects of the email-client correction interface include, among other things, the ability to view and correct transcriptions of audio data, monitor the transcription status of audio data sent to the transcription server, and modify account settings. The email-client correction interface is described in greater detail below with respect to FIGS. 11-14.

FIG. 4 illustrates a process 70 for using the email-client interface 60 to send messages including audio data through the transcription system 10. The user selects one or more email messages including audio data to be transcribed (step 72). In some embodiments, the selected email messages include attached audio data representing voice mail messages. Selecting the email messages may include highlighting the messages, opening individual messages, or any other acceptable selection techniques. After step 72, the user selects the send button 62 from the email-client interface 60 to forward the selected email messages to the transcription server 20 (step 74). Additionally or alternatively, the user can reply to a message from the transcription server 20, make changes or corrections to the transcribed text, and send the message back to the transcription server 20, as described below.
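
By way of illustration only, the request sent in step 74 can be pictured as an ordinary email message carrying the audio data as an attachment. The following Python sketch uses the standard `email` package; the function name, the addresses, the subject line, and the default filename are hypothetical and are not part of the described system.

```python
from email.message import EmailMessage


def build_transcription_request(sender, server_address, audio_bytes,
                                filename="voicemail.wav"):
    """Wrap audio data in an email addressed to a transcription server.

    All names and addresses here are illustrative assumptions.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = server_address
    msg["Subject"] = "Transcription request"
    msg.set_content("Please transcribe the attached audio.")
    # Attach the audio data; the message becomes multipart/mixed.
    msg.add_attachment(audio_bytes, maintype="audio", subtype="wav",
                       filename=filename)
    return msg


if __name__ == "__main__":
    request = build_transcription_request(
        "user@example.com", "transcribe@example.com", b"RIFF....WAVE")
    print(request["To"], [p.get_filename() for p in request.iter_attachments()])
```

A plug-in providing the send button 62 could build such a message for each selected email and hand it to the email-client's normal outgoing path.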

When the messages arrive at the transcription server 20, identifying information is taken from the email messages to identify a user account (step 76). In some embodiments the identifying information is metadata taken from the email message. The metadata may include, among other things, information such as a sender's email address and IP address. In other embodiments, identifying information is included in the body of the email message and extracted to identify a user account. After the account is identified, the message is sent to a voice file directory 52 related to that account (step 78). Account settings, such as, for example, destination information and formatting information, may be modified for each account. The account settings can be modified or accessed through a system interface, such as the email-client correction interface.
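
As a minimal sketch of step 76, assuming the account is keyed on the sender's email address, the lookup could resemble the following Python code. The `ACCOUNTS` table, its field names, and the function name are hypothetical; an actual embodiment may use other metadata, such as an IP address, or information in the message body.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical account table keyed by lower-cased sender address.
ACCOUNTS = {
    "alice@example.com": {
        "account_id": "acct-001",
        "destination": "alice@example.com",
        "format": "plain",
    },
}


def identify_account(raw_email, accounts):
    """Return the account settings for the sender, or None if unknown."""
    msg = message_from_string(raw_email)
    # parseaddr splits 'Alice <alice@example.com>' into (name, address).
    _, address = parseaddr(msg.get("From", ""))
    return accounts.get(address.lower())


if __name__ == "__main__":
    raw = ("From: Alice <Alice@Example.com>\n"
           "To: transcribe@example.com\n"
           "Subject: voicemail\n\n"
           "See attached audio.")
    print(identify_account(raw, ACCOUNTS))
```

The returned settings could then select the voice file directory 52 for the account and supply destination and formatting information.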

The messages stored in the voice file directory 52 awaiting transcription are polled into a queue server 54 (step 80). The queue server 54 holds the messages until a translation server 56 becomes available. When a translation server 56 becomes available, the queue server 54 routes the messages to the available translation server 56 (step 82). The messages enter the translation server 56 and the audio data associated with the message is transcribed (step 84). As described below, the transcription server can also receive messages with corrected transcribed text. If the transcription server 20 receives a message including corrected transcribed text, the transcription server 20 compares the original transcribed text with the user-corrected transcribed text. After the transcription server 20 has compared the original and the user-corrected text, a message including the user-corrected text or the differences between the original text and the user-corrected text is sent to a training queue to update the voice model, as described below.
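
The hold-and-route behavior of steps 80 and 82 can be illustrated with a simple first-in, first-out model. The Python sketch below is an assumption-laden simplification: the class name, method names, and server identifiers are hypothetical, and a real queue server would also handle priorities, failures, and concurrency.

```python
from collections import deque


class QueueServer:
    """Hold messages until a translation server is available (FIFO)."""

    def __init__(self, translation_servers):
        self.pending = deque()                 # messages awaiting transcription
        self.idle = deque(translation_servers)  # currently available servers

    def enqueue(self, message):
        self.pending.append(message)

    def dispatch(self):
        """Pair waiting messages with idle servers; return the assignments."""
        assignments = []
        while self.pending and self.idle:
            assignments.append((self.idle.popleft(), self.pending.popleft()))
        return assignments

    def release(self, server):
        """Mark a server as available again after it finishes a message."""
        self.idle.append(server)


if __name__ == "__main__":
    queue = QueueServer(["ts-1", "ts-2"])
    for name in ("msg-1", "msg-2", "msg-3"):
        queue.enqueue(name)
    print(queue.dispatch())   # two messages routed, one still held
    queue.release("ts-1")
    print(queue.dispatch())   # the held message is routed
```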

After the audio data has been transcribed, the transcribed text may be sent to a third party for correction or may be sent directly to one or more destinations specified in the user's account settings (step 86). As described above, the transcribed text can be sent to a destination in an email message (e.g., embedded or as an attached file). In some embodiments, if the transcribed text is not sent to a third party, it is sent directly to the training queue to update the voice model (step 90). If the transcribed text is sent to a third party for correction, the third party will correct the transcription using, for example, the email-client correction interface described below (step 88). After step 88, the transcribed and/or corrected text is sent to the training queue to update the voice model (step 90). The transcribed text is then sent back to the user (step 92). A more detailed description of the transcription server 20 is provided below.

As shown in FIG. 5, the transcription server 20 receives audio data 100 from one or more of the audio data sources 30. In some embodiments, as noted above, the transcription server 20 includes or is connected to one or more intermediary servers, such as an email server 20 a, that receive messages from the audio data sources 30. Additional intermediary servers may be present such as a voice over IP (“VoIP”) server 20 c and a short message service (“SMS”) server 20 b to receive audio data from additional sources. The messages can be received continuously or in batch form, and can be sent to the transcription server 20 and/or pulled by the transcription server 20 in any manner (e.g., continuously, in batch form, and the like). For example, in some embodiments, the transcription server 20 is adapted to request messages at regular intervals and/or to be responsive to a user command or to some other event. In some embodiments, rather than immediately transmitting the converted message(s) to the transcription server 20, the audio data sources 30 and/or any intermediary servers store the converted message(s) until requested by the transcription server 20 or a separate polling computer. By requesting messages from the audio data sources 30 and/or any intermediary servers, the transcription server 20 or the separate polling computer can manage the messages. For example, in one implementation, the transcription server 20 or a separate polling computer establishes a priority for received messages to be transcribed. The transcription server 20 or a separate polling computer also determines a source of a received message (e.g., the audio data source 30 that transmitted the message). For example, the transcription server 20 or separate polling computer can use metadata taken from the email containing audio data to identify the source of a particular message. In additional embodiments, other types of identifying data can be used to identify the source of a received message.

Once the transcription server 20 or separate polling computer receives one or more messages (received by request or otherwise), the transcription server 20 or separate polling computer places the messages and/or the associated audio data to be transcribed into one or more queue servers 54. The queue servers 54 look for an open or available processor or translation server 56. As shown in FIG. 5, the transcription server 20 includes multiple translation servers 56, although a different number of translation servers 56 (e.g., physical or virtual) is possible. Upon identifying an available translation server 56, the queue servers 54 route audio data to the available translation server 56. The translation server 56 transcribes the audio data to generate text data and, in some embodiments, indexes the message. The translation servers 56 index the messages using a database to identify discrete words. For example, the translation server 56 can use an extensible markup language (“XML”), structured query language (“SQL”), MySQL, idx, or other database language to identify discrete words or phrases within the transcribed text.

In addition to transcribing audio data included in messages as just described, some embodiments of a translation server 56 generate an index of keywords based upon the transcribed text. For example, in some embodiments, the translation server 56 removes those words that are less commonly searched and/or less useful for searching (e.g., I, the, a, an, but, and the like) from transcribed text, which leaves a number of keywords that can be stored in memory available to the translation servers 56. The resulting “keyword index” includes the exact positions of each keyword in the transcribed text, and, in some cases, includes the exact location of each keyword in the corresponding audio data. This keyword index enables users to perform searches on transcribed text. For example, a user accessing the transcribed text associated with particular audio data (whether for purposes of correcting any errors in the transcribed text or for searching within the transcribed text) can select one or more words from the keyword index of the message generated earlier. In so doing, the exact locations (e.g., page and/or line numbers) of such words can be provided quickly and efficiently, in many cases significantly faster and with less processing power than performing a standard search for the word through the entire transcribed text. The system 10 can provide the keyword index to a user in any suitable manner, such as in a pop-up or pull-down menu included in an interface of the system 10, such as the email-client correction interface, during text correction or searching of transcribed text (described below).
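
One way to picture such a keyword index, offered only as a sketch, is a mapping from each retained word to the positions where it occurs. In the Python code below, the stop-word list, the tokenizing pattern, and the function name are illustrative assumptions; the index records each keyword's word position and character offset in the text, and an embodiment could store audio timestamps alongside them.

```python
import re
from collections import defaultdict

# Illustrative stop words; an embodiment may use a different list.
STOP_WORDS = {"i", "the", "a", "an", "but", "and", "to", "of"}


def build_keyword_index(text):
    """Map each keyword to a list of (word_position, character_offset)."""
    index = defaultdict(list)
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9']+", text)):
        word = match.group().lower()
        if word not in STOP_WORDS:
            index[word].append((position, match.start()))
    return dict(index)


if __name__ == "__main__":
    idx = build_keyword_index("Call the office and call me back")
    print(idx["call"])    # every occurrence of "call", case-insensitive
    print("the" in idx)   # stop words are not indexed
```

Looking up a selected keyword is then a single dictionary access rather than a scan of the full transcribed text.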

Also, in some embodiments, a translation server 56 generates two or more possible candidates for a transcription of a spoken word or phrase from audio data. The most likely candidate is displayed or otherwise used to generate the transcribed text, and the less likely candidate(s) are saved in a memory accessible by the translation server 56 and/or by another server or third party device 40 as needed. This capability can be useful, for example, during correction of the transcribed text (described below). In particular, if a word in the transcribed text is wrong, a user can obtain other candidate(s) identified by the translation server 56 during transcription, which can speed up and/or simplify the correction process.
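
The handling of multiple candidates can be sketched as follows, assuming each candidate arrives with a numeric likelihood score. The score scale, the tuple layout, and the function name are hypothetical; the source does not specify how candidates are represented.

```python
def split_candidates(candidates):
    """Separate the most likely candidate from the saved alternates.

    candidates: non-empty list of (text, score) pairs, higher score = more
    likely. Returns (best_text, [alternate_texts in descending likelihood]).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best = ranked[0][0]
    alternates = [text for text, _ in ranked[1:]]
    return best, alternates


if __name__ == "__main__":
    best, saved = split_candidates(
        [("their", 0.35), ("there", 0.55), ("they're", 0.10)])
    print(best)    # used in the transcribed text
    print(saved)   # offered later if the user flags the word as wrong
```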

Once audio data is transcribed, the system 10 can allow a user to search transcribed text for particular words and/or phrases. This searching capability can be used during correction of transcribed text as described below or when a transcribed text file is searched for particular words (whether a search for such words is performed on the file alone or in combination with one or more other files). For example, using the indexed message, a user viewing generated text data can select a word or phrase included in the text data and, in some embodiments, can hear the corresponding portion of the audio data from which the text data was generated. In some embodiments, the system 10 is adapted to enable a user to search some or all transcribed text files accessible by the transcription server 20, regardless of whether such files have been corrected. Also, the system 10 can enable a user to search transcribed text using Boolean and/or other search terms.

Search results can be generated in a number of manners, such as in a table form enabling a user to select one or more files in which a word or phrase has been found and/or one or more locations at which a word or phrase has been found in particular text data. The search results can also be sorted in one or more manners according to one or more rules (e.g., date, relevance, number of instances in which the word or phrase has been found in text data, and the like) and can be printed, displayed, or exported as desired. In some embodiments, the search results also provide the text around the found word or phrase. The search results can also include additional information, such as the number of instances in which a word or phrase has been found in a transcribed text file and/or the number of transcribed text files in which a word or phrase has been found.

After the translation servers 56 index and transcribe audio data, the audio data and/or the generated text data is stored. The audio data and text data can be stored internally by the transcription server 20 or can be stored externally to one or more data storage devices (e.g., databases, servers, and the like). In some embodiments, a user (e.g., a user associated with a particular audio data source email-client 30) decides how long audio data and/or text data is stored by the transcription server 20, after which time the audio data and/or text data can be automatically deleted, over-written, or stored in another storage device (e.g., a relatively low-accessibility mass storage device). An interface of the system 10 (e.g., the email-client correction interface) enables a user to specify a time limit for audio data and/or text data stored by the transcription server 20.
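
A user-specified storage time limit could be applied with a check along the following lines. This Python sketch assumes the limit is expressed in days and that each stored item records when it was stored; both assumptions, and all names, are illustrative.

```python
from datetime import datetime, timedelta


def expired_items(items, retention_days, now):
    """Return identifiers of stored items older than the retention limit.

    items: iterable of (item_id, stored_at) pairs with datetime values.
    """
    cutoff = now - timedelta(days=retention_days)
    return [item_id for item_id, stored_at in items if stored_at < cutoff]


if __name__ == "__main__":
    stored = [("audio-1", datetime(2024, 1, 1)),
              ("audio-2", datetime(2024, 3, 1))]
    print(expired_items(stored, retention_days=30, now=datetime(2024, 3, 15)))
```

Items returned by the check could then be deleted, over-written, or moved to another storage device, as the account settings dictate.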

As shown in FIGS. 1 and 2, a data source email-client 30 connects to the transcription server 20 over a network, such as the Internet, one or more local or wide-area networks 50, or the like, in order to obtain audio data and/or corresponding, generated text data. A user uses the data source email-client 30 to access the email-client correction interface associated with the transcription server 20 to obtain generated text data and/or corresponding audio data. For example, using the email-client correction interface, the user can request particular audio data and/or the corresponding text data. The requested data is obtained from the transcription server 20 and/or a separate data storage device and is transmitted to the user for display via the interface.

The transcription server 20 sends audio data and/or corresponding generated text data to the user as an email message. The transcription server 20 can send an email message to a user that includes the audio data and the text data as attached files. In other embodiments, the transcription server 20 sends an email message to a user that includes a notification that audio data and/or text data is available for the user. A user uses the email-client correction interface in order to listen to the audio data, view the text data, and/or to correct the text data. As described above, in some embodiments, a user can reply to the email message sent from the transcription server 20, correct the transcription, and send the corrected transcription back to the transcription server 20. The transcription server then updates the voice model based on a comparison of the original transcribed text and the user-corrected transcribed text. If the user replies directly to the transcription server, the user does not need to access the email-client correction interface, web interface, or other interfaces of the system 10.

In other embodiments, the user can choose to correct only parts of transcribed text. If the user corrects only a portion of the transcribed text, the email-client (e.g., the email-client correction interface) recognizes that only a portion of the text has changed and transmits only the corrected portion of the text to the transcription server 20 for use in training the voice model. By submitting only the corrected or changed portion of the transcribed text, the amount of data transmitted to the transcription server 20 for processing is reduced. In other embodiments, another email-client interface, a web-based interface, the transcription server 20, or another device included in the system 10 can determine what portions of transcribed text have been changed and can limit transmission and/or processing of the changed text accordingly.
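
Determining which portion of the text changed can be illustrated with a word-level comparison. The Python sketch below uses the standard `difflib` module as a stand-in; the source does not state which comparison technique is used, and the record layout and function name are hypothetical.

```python
from difflib import SequenceMatcher


def changed_portions(original, corrected):
    """Return only the word spans that differ between two transcriptions.

    Each record gives the word range in the original text together with
    the old and new wording, so unchanged text need not be transmitted.
    """
    old_words, new_words = original.split(), corrected.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append({
                "start": i1,
                "end": i2,
                "old": " ".join(old_words[i1:i2]),
                "new": " ".join(new_words[j1:j2]),
            })
    return changes


if __name__ == "__main__":
    print(changed_portions("please call me back at five",
                           "please call me back at nine"))
```

For a single corrected word, the payload is one short record rather than the full transcription, which is the data reduction described above.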

If a user forwards or sends an email message to the transcription server 20 that includes audio data, the transcription server 20 can send a return email message to the user after the transcription server 20 transcribes the submitted audio file. The email message can inform the user that the submitted audio data was transcribed and that corresponding text data is available. As previously noted, the email message from the transcription server 20 can include the submitted audio data and/or the generated text data.

The system 10 can also enable a user to provide destination settings for audio data and/or text data on a per-generated-text-data basis. In some embodiments, before or after audio data is transcribed, a user specifies a particular destination for the text data. As described above, certain implementations allow a user to specify destination settings in an email message. For example, if the user sends an email message to the transcription server 20 that includes audio data, the user can specify destination information in the email message. After the audio message is transcribed and the generated text data is corrected (if applicable), the transcription server 20 sends an email message to the identified recipient (e.g., via an SMTP server).

In some embodiments, to protect the privacy and security of the audio and text data, the transcription server 20 transmits data (e.g., audio data and/or text data) to the third party device 40 or another destination device using file transfer protocol (“FTP”). The transmitted data can also be protected by a secure socket layer (“SSL”) mechanism (e.g., a bank level certificate).

In one embodiment, the system 10 includes an email-client correction interface and a streaming translation server 102 that a user accesses (e.g., via the data source email-client 30) to view generated text. As described below with respect to FIG. 11, in some embodiments, the email-client correction interface and the streaming translation server 102 also enable a user to stream the entire audio data corresponding to the generated text data and/or to stream any desired portion of the audio data corresponding to selected text data. For example, the email-client correction interface and the streaming translation server 102 enable a user to select (e.g., click-on, highlight, mouse over, etc.) a portion of the text in order to hear the corresponding audio data. In addition, in some embodiments, the email-client correction interface and the streaming translation server 102 enable a user to specify a number of seconds that the user desires to hear before and/or after a selected portion of text data.
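
Mapping a text selection to a stretch of audio, including the extra seconds before and after, reduces to simple arithmetic once each transcribed word carries start and end times. The Python sketch below assumes such per-word timestamps exist; that assumption, and the function and parameter names, are illustrative.

```python
def playback_window(word_times, first_word, last_word,
                    pad_before=0.0, pad_after=0.0, audio_length=None):
    """Compute the (start, end) seconds of audio to stream for a selection.

    word_times: list of (start_sec, end_sec), one entry per transcribed word.
    first_word, last_word: indices of the selected words (inclusive).
    pad_before, pad_after: extra seconds requested by the user.
    """
    start = max(0.0, word_times[first_word][0] - pad_before)
    end = word_times[last_word][1] + pad_after
    if audio_length is not None:
        end = min(end, audio_length)  # do not run past the end of the audio
    return start, end


if __name__ == "__main__":
    times = [(0.0, 0.4), (0.4, 0.9), (0.9, 1.5), (1.5, 2.0)]
    print(playback_window(times, 1, 2, pad_before=1.0, pad_after=2.0,
                          audio_length=3.0))
```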

The email-client correction interface also enables a user to correct generated text data. For example, if a user listens to audio data and determines that a portion of the corresponding generated text data is incorrect, the user can correct the generated text data via the email-client correction interface. In some embodiments, the email-client correction interface automatically identifies potentially incorrect portions of generated text data by displaying potentially incorrect portions of the generated text data in a particular color or other format (e.g., via a different font, highlighting in bold, italics, underline, or any other manner). The email-client correction interface also displays portions of the generated text in various colors or other formats depending on the confidence that the portion of the generated text is correct. The email-client correction interface also inserts a placeholder (e.g., an image, an icon, etc.) into text that marks portions of the generated text where text is missing (i.e., the transcription server 20 could not generate text based on the audio data). A user selects the placeholder in order to hear the audio data corresponding to the missing text and can insert the missing text accordingly.
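
Confidence-dependent formatting and placeholders can be sketched as a small rendering pass over per-word confidence values. In the Python code below, the thresholds, the plain-text markers standing in for colors or fonts, and the `[missing]` placeholder are all hypothetical choices made for illustration.

```python
def mark_up(words, low=0.5, high=0.8):
    """Render transcribed words with markers reflecting confidence.

    words: list of (text, confidence); text is None where no text could
    be generated for the audio. Thresholds and markers are illustrative.
    """
    rendered = []
    for text, confidence in words:
        if text is None:
            rendered.append("[missing]")      # placeholder for missing text
        elif confidence < low:
            rendered.append(f"**{text}**")    # likely incorrect
        elif confidence < high:
            rendered.append(f"_{text}_")      # uncertain
        else:
            rendered.append(text)             # confident
    return " ".join(rendered)


if __name__ == "__main__":
    print(mark_up([("call", 0.95), ("me", 0.7),
                   ("tomorrow", 0.3), (None, 0.0)]))
```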

In order to assist a user in correcting generated text data, some embodiments of the email-client correction interface automatically generate words similar to incorrectly-generated words. In this regard, a user selects a word (e.g., by highlighting, clicking, or by any other suitable manner) within generated text data that is or appears to be incorrect. Upon such selection, the email-client correction interface suggests similar words, such as in a pop-up menu, pull-down menu, or in any other format. The user selects a word or words from the list of suggested words in order to make a desired correction.

In some embodiments, the translation server(s) 56 are configured to automatically determine speakers in an audio file. For example, the translation server 56 processes audio files for drastic changes in voice or audio patterns. The translation server 56 then analyzes the patterns in order to identify the number of individuals or sources speaking in an audio file. In other embodiments, a user or information associated with the audio file (e.g., information included in the email message containing the audio data, or stored in a separate text file associated with the audio data) identifies the number of speakers in an audio file before the audio file is transcribed. For example, a user uses an interface of the system 10 (e.g., the email-client correction interface) to specify the number of speakers in an audio file before or after the audio file is transcribed.

After identifying the number of speakers in an audio file, the translation server(s) 56 can generate a speaker list that marks the number of speakers and/or the times in the audio file where each speaker speaks. The translation server(s) 56 can use the speaker list when creating or formatting the corresponding text data to provide markers or identifiers of the speakers (e.g., Speaker 1, Speaker 2, etc.) within the generated text data. In some embodiments, a user can update the speaker list in order to change the number of speakers included in an audio file, change the identifier of the speakers (e.g., to the names of the speakers), and/or specify that two or more speakers identified by the translation server(s) 56 relate to a single speaker or audio source. Also, in some embodiments, a user can use an interface of the system 10 (e.g., the email-client correction interface) to modify the speaker list or to upload a new speaker list. For example, a user can change the identifiers of the speakers by updating a field of the email-client correction interface that identifies a particular speaker. For example, each speaker identifier displayed within generated text data can be placed in a user-editable field. In some embodiments, changing an identifier of a speaker in one field automatically changes the identifier for the speaker throughout the generated text data.
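By way of illustration only, propagating a changed speaker identifier throughout the generated text data, and merging two identified speakers into a single speaker, might be sketched as follows. The sketch assumes the generated text data is held as a list of (speaker identifier, text) pairs; that representation and the function names are hypothetical.

```python
def rename_speaker(segments, old_id, new_id):
    """Replace one speaker identifier everywhere it appears."""
    return [(new_id if speaker == old_id else speaker, text) for speaker, text in segments]


def merge_speakers(segments, duplicate_ids, target_id):
    """Treat several identified speakers as a single speaker or audio source."""
    return [(target_id if speaker in duplicate_ids else speaker, text) for speaker, text in segments]


def render(segments):
    """Format the segments with a speaker marker in front of each one."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in segments)


segments = [
    ("Speaker 1", "Hello."),
    ("Speaker 2", "Hi there."),
    ("Speaker 3", "How are you?"),
]
# The user decides Speaker 3 is really Speaker 1, then names that speaker.
segments = merge_speakers(segments, {"Speaker 3"}, "Speaker 1")
segments = rename_speaker(segments, "Speaker 1", "Alice")
print(render(segments))
```

Keeping the identifier separate from the text is what allows a single edit in one field to update every occurrence.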

In some embodiments, the system 10 also formats transcribed text data based on one or more templates, such as templates adapted for particular users or businesses (e.g., medical, legal, engineering, or other fields). For example, after generating text data, the system 10 (e.g., the translation server(s) 56) compares the text data with one or more templates. If the format or structure of the text data corresponds to the format or structure of a template and/or if the text data includes one or more keywords associated with a template, the system 10 formats the text data based on the template. For example, if the system 10 includes a template specifying the following format:

Date:

Type of Illness:

and text data generated by the system 10 is “the date today is September the 12th, the year is 2007, the illness is flu,” the system 10 automatically applies the template to the text data in order to create the following formatted text data:

Date: Sep. 12, 2007

Type of Illness: Flu

In some embodiments, the system 10 is configured to automatically apply a template to text data if text data corresponds to the template. Therefore, as the system 10 “learns” and improves its transcription quality, as described below, the system 10 also “learns” and improves its application of templates. In other embodiments, a user uses an interface of the system 10 (e.g., the email-client correction interface) to manually specify a template to be applied to text data. For example, a user can select a template to apply to text data from a drop-down menu or other selection mechanism included in the interface.
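By way of illustration only, the keyword comparison used to decide whether a template corresponds to generated text data might be sketched as follows. The template names, the keyword sets, and the function matching_templates are hypothetical; the sketch covers only the keyword test and does not attempt the structural comparison or the extraction of field values.

```python
import re


def matching_templates(text, templates, min_hits=1):
    """Return template names whose keywords appear in the text, best match first."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = []
    for name, keywords in templates.items():
        count = len(words & set(keywords))
        if count >= min_hits:
            hits.append((count, name))
    return [name for count, name in sorted(hits, reverse=True)]


# Hypothetical keyword sets associated with two templates.
templates = {
    "medical_visit": {"date", "illness"},
    "legal_memo": {"plaintiff", "defendant"},
}
text = "the date today is September the 12th, the year is 2007, the illness is flu"
print(matching_templates(text, templates))
```

A minimum number of keyword hits (min_hits) is one possible way to avoid applying a template on the strength of a single common word.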

The system 10 can store the formatted text data and can make the formatted text data available for review and correction, as described below. In some embodiments, the system 10 stores or retains the unformatted text data separately from the formatted text data. By retaining the unformatted text data, the text data can be applied to new or different templates. In addition, the system 10 can use the unformatted text data to train the system 10, as described below.

The system 10 is configured to allow a user to create a customized template and upload the template to the system. For example, a user uses a word processing application, such as Microsoft® Word®, to create a text file that defines the format and structure of a customized template. The user then uploads the text file to the system 10 using an interface of the system 10 (e.g., the email-client interface 60 and/or the email-client correction interface). In some embodiments, the system 10 reformats uploaded templates. For example, the system 10 can store predefined templates and/or customized templates in a mark-up language, such as XML or HTML.

Templates can be associated with a particular user or a group of users. For example, only users with certain permissions may be allowed to use or apply particular templates. In other embodiments, a user can upload one or more templates that only he or she can use or apply. Settings and restrictions for predefined and/or customized templates can be configured by a user or an administrator using an interface of the system 10.

In some embodiments, alternatively or in addition to configuring templates, the system 10 enables a user to configure one or more commands that replace transcribed text with different text. For example, a user configures the system 10 to insert the current date into text data whenever audio data and/or corresponding text data includes the word “date” or the phrases “today's date,” “current date,” or “insert today's date.” Similarly, in another embodiment, system 10 is configured to start a new paragraph within transcribed text data each time audio data and/or corresponding text data includes the word “paragraph,” the phrase “new paragraph,” or a similar identifier. The commands can be defined on a per-user basis and/or on a per-group basis, and settings or restrictions for the commands can be set by a user or an administrator using the system 10.
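By way of illustration only, commands that replace transcribed text with different text might be sketched as follows. The function apply_commands and the ISO date format are assumptions made for the sketch; a deployed system could store the phrases and replacements per user or per group.

```python
import re
from datetime import date

# Longer phrases are listed first so that "insert today's date" is not
# partially consumed by the shorter "date" alternative.
DATE_COMMAND = re.compile(
    r"\b(?:insert today's date|today's date|current date|date)\b", re.IGNORECASE
)
PARAGRAPH_COMMAND = re.compile(
    r"\s*\b(?:new paragraph|paragraph)\b[\s,.]*", re.IGNORECASE
)


def apply_commands(text, today=None):
    """Replace spoken date commands with the date and paragraph commands with a break."""
    today = today or date.today()
    text = DATE_COMMAND.sub(lambda match: today.isoformat(), text)
    return PARAGRAPH_COMMAND.sub("\n\n", text)


print(apply_commands(
    "Seen on today's date. New paragraph follow up in two weeks",
    today=date(2007, 9, 12),
))
```

As in the description above, every occurrence of the bare word “date” is treated as a command, so a user who dictates that word in ordinary speech would need a narrower phrase list.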

Some embodiments of the system 10 also enable a user correcting text data via the email-client correction interface to create commands and/or keyboard shortcuts. In one example, the system is configured so that a user can use the commands and/or keyboard shortcuts to stream audio data, add common words or phrases to text data, play audio data, pause audio data, or start or select objects or functions provided through the email-client correction interface or other interfaces of the system 10. In some embodiments, a user uses the email-client correction interface to configure the commands and/or keyboard shortcuts. The commands and/or keyboard shortcuts can be stored on a user level and/or a group level. An administrator can also configure commands and/or keyboard shortcuts that can be made available to one user or multiple users. For example, users with particular permissions may be allowed to use particular commands and/or keyboard shortcuts.

In one embodiment, the email-client correction interface reacts to commands spoken by the user. In another embodiment, the system 10 is configured to permit a user to create commands that, when spoken by the user, cause the email-client correction interface to perform certain actions. In some embodiments, the user can say “play,” “pause,” “forward,” “backward,” etc. to control the playing of the audio data by the email-client correction interface. Other commands insert, delete, or edit text in transcribed text data. For example, when a user says “date,” the email-client correction interface inserts date information into transcribed text data.

In some embodiments, the system 10 also performs translations of transcribed text data. For example, the email-client correction interface or another interface of the system 10 includes features to permit a user to request a translation of transcribed text data into another language. The transcription server 20 includes one or more language translation modules configured to create text data in a particular language based on generated text data in another language. The system is also configured to process a request, received from an audio source (e.g., an individual submitting an email message with an attached audio file to the transcription server 20), to translate the file into a specific language when the audio file is submitted to the transcription server 20.

With continued reference to the illustrated embodiment of FIG. 5, corrections made by a user through the email-client correction interface are transmitted to the transcription server 20. As shown in FIG. 5, the transcription server 20 includes a training server 104. The training server 104 can use the corrections made by a user to “learn” so that future incorrect translations are avoided. In some embodiments, since audio data is received from one or more audio data sources 30 representing multiple “speakers,” and since the email-client correction interface can be accessible over a network by multiple users, the training server 104 receives corrections from multiple users and, therefore, uses a voice independent model to learn from multiple speakers or audio data sources.

In some embodiments, the system 10 transcribes audio files of a predetermined size (e.g., over 20 minutes in length) in pieces in order to “pre-train” the translation server(s) 56. For example, the transcription server 20 and/or the translation server(s) 56 can divide an audio file into segments (e.g., 1 to 5 minute segments). The translation server(s) 56 can then transcribe one or more of the segments and the resulting text data can be made available to a user for correction (e.g., via the email-client correction interface). After the transcribed segments are corrected and any corrections are applied to the training server 104 in order to “teach” the system 10, the translation server(s) 56 transcribe the complete audio file. After the complete audio file is transcribed, the transcription of the complete audio file is made available to a user for correction. Using the small segments of the audio file to pre-train the translation server(s) 56 helps increase the accuracy of the transcription of the complete audio file, which can save time and can prevent errors. In some embodiments, the complete audio file is transcribed before or in parallel with one or more smaller segments of the same audio file. Once the complete audio file is transcribed, a user can then immediately review and correct the text for the complete audio file or can wait until the individual segments are transcribed and corrected before correcting the text of the complete audio file. In addition, a user can request a re-transcription of the complete audio file after one or more individual segments are transcribed and corrected. In some embodiments, if the complete audio file is transcribed before or in parallel with smaller segments and the transcription of the complete audio file has not been corrected by the time the individual segments are transcribed and corrected, the transcription server 20 and/or the translation server(s) 56 automatically re-transcribes the complete audio file.
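By way of illustration only, the decision to pre-train and the division of an audio file into segments might be sketched as follows. The 20-minute threshold and the 5-minute segment length are taken from the examples above; the function names are hypothetical.

```python
def needs_pretraining(duration_s, threshold_s=20 * 60):
    """True when the audio file is long enough to be transcribed in pieces first."""
    return duration_s > threshold_s


def split_into_segments(duration_s, segment_s=5 * 60):
    """Return (start, end) boundaries in seconds that cover the whole audio file."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)  # the final segment may be shorter
        segments.append((start, end))
        start = end
    return segments


duration = 25 * 60  # a 25-minute recording
if needs_pretraining(duration):
    for start, end in split_into_segments(duration):
        print(f"segment {start:.0f}s to {end:.0f}s")
```

Only some of the resulting segments need to be transcribed and corrected before the complete audio file is transcribed.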

The voice independent model developed by the transcription server 20 can be shared and used by multiple transcription servers 20. For example, in some embodiments, the voice independent model developed by a transcription server 20 can be copied to or shared with other transcription servers 20. The model can be copied to other transcription servers 20 based on a predetermined schedule, anytime the model is updated, on a manual basis, etc. In some embodiments, a lead transcription server 20 collects audio and text data from other transcription servers 20 (e.g., audio and text data which has not been applied to a training server) and transfers the data to a lead training server 104. The lead transcription server 20 can collect the audio and text data during periods of low network or processor usage. The individual training servers 104 of one or more transcription servers 20 can also take turns processing batches of audio data and copying updated voice models to other transcription servers 20 (e.g., in a predetermined sequence or schedule), which can ensure that each transcription server 20 is using the most up-to-date voice model.

In some embodiments, individuals may be hired to correct transcribed audio files (“correctors”), and the correctors may be paid on a per-line, per-word, per-file, time, or the like basis, and the transcription server 20 can track performance data for the correctors. The performance data can include line counts, usage counts, word counts, etc. for individual correctors and/or groups of correctors. In some embodiments, the transcription server 20 enables a user (e.g., an administrator) to access the performance data via an interface of the system 10 (e.g., an email-client correction interface or a website). The user can use the interface to input personal information associated with the performance data, such as the correctors' names, employee numbers, etc. In some embodiments, the user can also use the interface to initiate and/or specify payments to be made to the correctors. The performance data (and any related information provided by a user, such as an administrator) can be stored in a database and/or can be exported to an external accounting system, such as accounting systems and solutions provided by Paychex, Inc. or QuickBooks® provided by Intuit, Inc. The transcription server 20 can send the performance data to an external accounting system via a direct connection or an indirect connection, such as the Internet. The transcription server 20 can also generate a file that can be stored to a portable data storage medium (e.g., a compact disk, a jump drive, etc.). The file can then be uploaded to an external accounting system from the portable data storage medium. An external accounting system can use the performance data to pay the correctors, generate financial documents, etc.

In some embodiments, a user may not desire or need transcribed text data to be corrected. For example, a user may not want text data that is substantially accurate to be corrected. In these situations, the system 10 can allow a user to designate an accuracy threshold, and the system 10 can apply the threshold to determine whether text data should be corrected. For example, if generated text data has a percentage or other measurement of accurate words (as determined by the transcription server 20) that is equal to or greater than the accuracy threshold specified by the user, the system 10 can allow the text data to skip the correction process (and the associated training or learning process). The system 10 can deliver any generated text data that skips the correction process directly to its destination (e.g., directly sent to a user via an email message, directly stored to a database, etc.). In some embodiments, the accuracy threshold can be set by a user using any described interface of the system 10. The threshold can be applied to all text data or only to particular text data (e.g., only text data generated based on audio data received from a particular audio source, only text data that is associated with a particular destination, etc.).
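By way of illustration only, applying an accuracy threshold might be sketched as follows. The sketch assumes the transcription server 20 reports, for each word, whether it considers the word accurate; how that determination is made is not specified here, and the function name is hypothetical.

```python
def should_skip_correction(accurate_flags, accuracy_threshold):
    """Return True when the share of accurate words meets or exceeds the threshold.

    accurate_flags: one boolean per transcribed word (assumed server output).
    accuracy_threshold: fraction between 0.0 and 1.0 chosen by the user.
    """
    if not accurate_flags:
        return False  # nothing was transcribed, so a person should look at it
    accuracy = sum(accurate_flags) / len(accurate_flags)
    return accuracy >= accuracy_threshold


flags = [True] * 19 + [False]  # 19 of 20 words judged accurate
print(should_skip_correction(flags, accuracy_threshold=0.90))
print(should_skip_correction(flags, accuracy_threshold=0.99))
```

Text data for which the function returns True would be delivered directly to its destination; all other text data would be made available for correction.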

FIG. 6 illustrates an exemplary transcription, correction, and training method or process performed by the system 10. The transcription, correction, and training process of the system 10 can be a continual process by which files enter the system 10 and are moved through the series of steps shown in FIG. 6. As shown in FIG. 6 (also with reference to FIGS. 1-3), the transcription server 20 receives audio data 100 from one or more data source email-clients 30. Next, the transcription server 20 places the audio data 100 into one or more queues 54 (step 120). Once a translation server or processor 56 is available, the audio data 100 is transmitted from a queue 54 to a translation server 56. The translation server 56 transcribes the audio data to generate text data, and indexes the audio data (step 122).

After the audio data is indexed and transcribed, the audio data and/or generated text data is made available to a user for review and/or correction via the email-client correction interface (step 124). If the text data needs to be corrected (step 126), the user makes the corrections and submits the corrections to the training server 104 of the transcription server 20 (step 128). The corrections are placed in a training queue and are prepared for archiving (step 130). Periodically, the training server 104 obtains all the corrected files from the training queue and begins a training cycle for an independent voice model (step 132). In other embodiments, the training server 104 obtains such corrected files immediately, rather than periodically. The training server 104 can be a server that is separate from the transcription server 20, and can update the transcription server 20 and/or other servers on a continuous or periodic basis. In other embodiments, the training server 104, transcription server 20, and any other servers associated with the system 10 are defined by the same computer. It should be understood that, as used herein and in the appended claims, the terms “server,” “queue,” “module,” etc. are intended to encompass hardware and/or software adapted to perform a particular function.

Any portion or all of the transcription, correction, and training process performed by the system 10 can be performed by one or more polling managers (e.g., associated with the transcription server 20, the training server 104, or other servers). In some embodiments, the transcription server 20 and/or the training server 104 utilizes one or more “flags” to indicate a stage of a file. By way of example only, these flags can include: (1) waiting for transcription; (2) transcription in progress; (3) waiting for correction; (4) correction completed; (5) waiting for training; (6) training in progress; (7) retention; (8) move to history pending; and (9) history.

In some embodiments, the only action required by a user as a message moves through different stages of the system 10 is to indicate that correction of the message has been completed. In other embodiments, a less automated system can exist, requiring more input from a user during the transcription, correction, and training process.

Another example of a method by which messages are processed in the system 10 is illustrated in FIG. 7. In this embodiment, a polling manager is used to control the timing of file processing in the system. In particular, at least a portion of the transcription, correction, and training process is moved along by alternating actions of a polling manager. In some embodiments, the polling manager runs on a relatively short time interval to move files from stage to stage within the transcription, correction, and training process. Although not required, the polling manager can move multiple files in different stages to the next stage at the same time.

With reference to the exemplary embodiment illustrated in FIG. 7, the polling manager locates files to enter the transcription, correction, and training process. For example, the polling manager can check a list of FTP servers/locations for new files. New files identified by the polling manager are downloaded (step 202) and added to the database (step 204). When a file arrives, the polling manager flags the file “waiting for transcription” (step 206). The polling manager then executes and moves the file to a transcription queue (step 208), after which time the next available server/processor transcribes the file (step 210) on a first-in, first-out basis, unless a different priority is assigned. Once the file is assigned to a server/processor for transcription, the polling manager flags the file “transcription in progress.” When transcription of the file is complete, the polling manager flags the file “waiting for correction” (step 212), and the file is made available to a user for correction (e.g., through the email-client correction interface). When a user is done correcting the file, the polling manager flags the file “correction completed” (step 214). The polling manager then flags the file “waiting for training,” and moves the corrected file into a waiting-to-be-trained queue (step 216). During the time in which the training process runs (step 218), the polling manager flags the file “training in progress.” After the training process, the polling manager flags the file “retention.” In some embodiments, a user-defined retention rule determines when and whether files are archived. During the time in which a file is being archived (step 220), the polling manager flags the file “move to history pending.” When a file has been archived, the polling manager flags the file “history.”

The archival process allows files to move out of the system 10 immediately or based at least in part upon set retention rules. Archived or historical files allow the system 10 to keep current files available quickly while older files can be encrypted, compressed, and stored. Archived files can also be returned to a user (step 222) in any manner as described above.

In some embodiments, the email-client correction interface shows the stage of one or more files in the transcription, correction, and training process. This process can be automated and database driven so that all files are used to build and train the voice independent model.

It should be noted that a database-driven system 10 allows redundancy within the system. Multiple servers can share the load of the process described above. Also, multiple servers across different geographic regions can provide backup in the event of a natural disaster or other problem at one or more sites.

FIG. 8 illustrates a correction method according to an embodiment of the invention. The correction process of FIG. 8 begins when audio data is received by the transcription server 20 and is transcribed (step 250). As described above with respect to FIGS. 1-2, the transcription server 20 can receive audio data from one or more devices running email-clients 30, such as a computer 30 a, a PDA 30 b, a BlackBerry device 30 c, a mobile phone 30 d, etc.

The transcription server 20 can send the correction notification to a user who is assigned to the correction of transcribed audio data associated with a particular owner or destination. For example, as the transcription server 20 transcribes voicemail messages for a particular member of an organization, the transcription server 20 can send a notification to a secretary or assistant of the member. An administrator can use an interface of the system 10 (e.g., the email-client interface 60) to configure one or more recipients who are to receive the correction notifications for a particular destination (e.g., a particular email account). An administrator can also specify settings for notifications, such as the type of notification to send (e.g., email, text, audio, etc.), the addresses or identifiers of the notification recipients (e.g., email addresses), the information to be included in the notifications, etc. For example, an administrator can establish rules for sending correction notifications, such as a rule that transcriptions associated with audio data received by the transcription server 20 from a particular audio data source should be corrected by particular users. In addition, as described above, an administrator can set one or more accuracy thresholds, which can dictate when transcribed audio data skips the correction process.

FIG. 9 illustrates an email correction notification 254 according to an embodiment of the invention that is listed in an inbox 255 of an email application. As shown in FIG. 9, the email correction notification 254 is listed as an email message in the inbox 255 similar to other email messages 256 received from other sources. For example, the inbox 255 can display the sender of the email correction notification 254 (i.e., the transcription server 20), an account or destination associated with the audio data and generated text data (e.g., an account number), and an identifier of the source of the audio data (e.g., the name of an individual that sent the message). As shown in FIG. 9, the identifier of the source of the audio data can optionally include an address or location of the audio data source. In some embodiments (e.g., depending on the email application used), the inbox 255 lists additional information about the notification 254, such as the size of the email correction notification 254, the time the notification 254 was sent, and/or the date that the notification 254 was sent.

To read the email correction notification 254, a user can select the notification 254 (e.g., by clicking on, highlighting, etc.) in the inbox 255. After the user selects the notification 254, the email application can display the contents of the notification 254, as shown in FIG. 10. The contents of the email correction notification 254 can include information similar to that displayed in the inbox 255. The contents of the email correction notification 254 can also indicate the length of the audio data transcribed by the transcription server 20 and the day, date, and/or time that the audio data was received by the transcription server 20. To correct the transcription, the user can access the email-client correction interface from his or her email-client. However, if the user does not have access to the email-client correction interface, a link 257 to a web interface is provided in the email correction notification.

FIGS. 11-14 illustrate the email-client correction interface 260 according to an embodiment of the invention. After a user receives a correction notification 254, the user can access the email-client correction interface 260 to review and correct the generated text data (if needed) (step 262). The email-client correction interface 260 is accessed from within the email-client. For example, when a user receives a correction notification indicating that the user has messages that either have been corrected or are ready to be corrected, the user can access the email-client correction interface 260 without launching a separate web browsing application. Additionally, a user can reply directly to a correction notification that includes transcribed text, correct the transcribed text in the body of the message, and send the corrected transcribed text back to the transcription server 20. After the corrected transcribed text is sent back to the transcription server 20, the voice model is updated accordingly.

As shown in FIG. 11, to access the email-client correction interface 260, the user may first be prompted to enter credentials and/or identifying information via a login screen 264 of the interface 260. For example, the login screen 264 can include one or more selection mechanisms and/or input mechanisms 266 that enable a user to select or enter credentials and/or identifying information. As shown in FIG. 11, the login screen 264 can include input mechanisms 266 for entering a username and a password. The input mechanisms 266 can be case sensitive and/or can be limited to a predetermined set and/or number of characters. For example, the input mechanisms 266 can be limited to approximately 30 non-space characters. A user can enter his or her username and password (e.g., as set by the user or an administrator) and can select a log in selection mechanism 268. Alternatively, a user can select a help selection mechanism 270 in order to access instructions, tips, help web pages, electronic manuals, etc. for the email-client correction interface 260.

After the user enters his or her credentials and/or identifying information, the email-client correction interface 260 verifies the entered information, and, if verified, the email-client correction interface 260 displays a main page 272, as shown in FIG. 12. The main page 272 includes a navigation area 274 and a view area 276. The navigation area 274 includes one or more selection mechanisms for accessing standard functions of the email-client correction interface 260. For example, as shown in FIG. 12, the navigation area 274 includes a help selection mechanism 278 and a log off selection mechanism 280. As described above, a user can select the help selection mechanism 278 in order to access instructions, tips, help web pages, electronic manuals, etc. for the email-client correction interface 260. A user selects the log off selection mechanism 280 in order to exit the email-client correction interface 260. In some embodiments, if a user selects the log off selection mechanism 280, the email-client correction interface 260 returns the user to the login screen 264.

As shown in FIG. 12, the navigation area 274 also includes an inbox selection mechanism 282, a my history selection mechanism 284, a settings selection mechanism 286, a help selection mechanism 288, and/or a log off selection mechanism 290. A user selects the inbox selection mechanism 282 in order to view the main page 272. The user selects the my history selection mechanism 284 in order to access previously corrected transcriptions. In some embodiments, if a user selects the my history selection mechanism 284, the email-client correction interface 260 displays a history page (not shown) similar to the main page 272 that lists previously corrected transcriptions. Alternatively or in addition to displaying the information displayed in the main page 272 (e.g., file name, checked out by, checked in by, creation date, priority), the history page can display correction date(s) for each transcription.

A user can select the settings selection mechanism 286 in order to access one or more settings pages (not shown) of the email-client correction interface 260. The settings pages can enable a user to change his or her notification preferences, email-client correction interface 260 preferences (e.g., change a username and/or password, set a time limit for transcriptions displayed in a history page), etc. For example, as described above, a user can use the settings pages to specify destination settings for audio data and/or generated text data, configure commands and keyboard shortcuts, specify accuracy thresholds, turn on or off particular features of the email-client correction interface 260 and/or the system 10, etc. In some embodiments, the number and degree of settings configurable by a particular user via the settings pages are based on the permissions of the user. An administrator can use the settings pages to specify global settings, group settings (e.g., associated with particular permissions), and individual settings. In addition, an administrator can use a settings page of the email-client correction interface 260 to specify users of the email-client correction interface 260 and can establish usernames and passwords for users. Furthermore, as described above with respect to FIGS. 9 and 10, an administrator can use a settings page of the email-client correction interface 260 to specify notification parameters, such as who receives particular notifications, what type of notifications are sent, what information is included in the notifications, etc.

As shown in FIG. 12, the view area 276 lists transcriptions (e.g., associated with the logged-in user) that need attention (e.g., correction). In some embodiments, the view area 276 includes one or more filter selection mechanisms 292 that a user can use to filter and/or sort the listed transcriptions. For example, a user can use a filter selection mechanism 292 to filter and/or sort transcriptions by creation date, priority, etc.
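By way of illustration only, filtering and sorting the listed transcriptions might be sketched as follows. The Transcription record, its field names, and the use of a numeric priority in which a larger number is more urgent are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Transcription:
    """One row of the view area (hypothetical field names)."""
    file_name: str
    created: date
    priority: int  # assumption: a larger number means more urgent
    checked_out_by: Optional[str] = None


def filter_and_sort(items, min_priority=None, sort_by="created", descending=False):
    """Keep rows at or above min_priority, then order them by the chosen column."""
    if sort_by not in ("created", "priority", "file_name"):
        raise ValueError(f"cannot sort by {sort_by!r}")
    kept = [t for t in items if min_priority is None or t.priority >= min_priority]
    return sorted(kept, key=lambda t: getattr(t, sort_by), reverse=descending)


rows = [
    Transcription("a.wav", date(2008, 6, 20), priority=1),
    Transcription("b.wav", date(2008, 6, 26), priority=3),
    Transcription("c.wav", date(2008, 6, 23), priority=2),
]
newest_urgent = filter_and_sort(rows, min_priority=2, sort_by="created", descending=True)
print([t.file_name for t in newest_urgent])
```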

The view area 276 can also list additional information for each transcription. For example, as shown in FIG. 12, the view area 276 can list a file name, a checked out by parameter, a checked out on parameter, a creation date, and a priority for each listed transcription. The view area 276 can also include an edit selection mechanism 294 and a complete selection mechanism 296 for each transcription.
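By way of illustration only, the filtering and sorting of listed transcriptions described above can be sketched as follows. The class name, field names, and the convention that a lower number indicates a higher priority are assumptions made for this sketch and are not taken from the described embodiments.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TranscriptionEntry:
    # Hypothetical record mirroring some of the columns listed in the view area.
    file_name: str
    creation_date: date
    priority: int  # assumption: 1 is the highest priority


def filter_and_sort(
    entries: List[TranscriptionEntry],
    max_priority: Optional[int] = None,
    sort_by: str = "creation_date",
) -> List[TranscriptionEntry]:
    """Keep entries at or above a priority level, then sort by one column."""
    kept = [e for e in entries if max_priority is None or e.priority <= max_priority]
    # sort_by names an attribute of TranscriptionEntry (e.g., "priority").
    return sorted(kept, key=lambda e: getattr(e, sort_by))


entries = [
    TranscriptionEntry("a.wav", date(2008, 3, 1), 2),
    TranscriptionEntry("b.wav", date(2008, 1, 15), 1),
    TranscriptionEntry("c.wav", date(2008, 2, 10), 3),
]
by_date = filter_and_sort(entries)
urgent = filter_and_sort(entries, max_priority=2, sort_by="priority")
```

In this sketch, `by_date` orders all entries by creation date, while `urgent` keeps only entries of priority 1 or 2 and orders them by priority.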

Returning to FIG. 8, after a user accesses the email-client correction interface 260, the user can select a transcription to correct (step 298). As shown in FIG. 12, to correct a particular transcription, the user selects the edit selection mechanism 294 associated with the transcription. When a user selects an edit selection mechanism 294, the email-client correction interface 260 displays a correction page 300, an example of which is shown in FIG. 13. The correction page 300 includes the navigation area 274, as described above with respect to FIG. 12, and a correction view area 302. The correction view area 302 displays the text data 303 generated by the transcription. A user can edit the text data 303 by deleting text, inserting text, cutting text, copying text, etc. within the correction view area.

In some embodiments, the correction view area 302 also includes a recording control area 304. The recording control area 304 can include one or more selection mechanisms for listening to or playing the audio data associated with the text data 303 displayed in the correction view area 302. For example, as shown in FIG. 13, the recording control area 304 can include a play selection mechanism 306, a stop selection mechanism 308, and a pause selection mechanism 310. A user can select the play selection mechanism 306 to play the audio data from the beginning and can select the stop selection mechanism 308 to stop the audio data. Similarly, a user can select the pause selection mechanism 310 to pause the audio data. In some embodiments, selecting the pause selection mechanism 310 after pausing the audio data causes the email-client correction interface 260 to continue playing the audio data (e.g., from the point at which the audio data was paused).

As shown in FIG. 13, the recording control area 304 can also include a continue from cursor selection mechanism 312. A user can select the continue from cursor selection mechanism 312 in order to start playing the audio data at a location corresponding to the position of the cursor within the text data 303. For example, if a user places a cursor within the text data 303 before the word “Once” and selects the continue from cursor selection mechanism 312, the email-client correction interface 260 plays the audio data starting from the word “Once.” In some embodiments, the recording control area 304 also includes a playback control selection mechanism 314 that a user can use to specify a number of seconds to play before playing the audio data starting at the cursor position. For example, as shown in FIG. 13, a user can specify 1 to 8 seconds using the playback control selection mechanism 314 (e.g., by dragging an indicator along the timeline or in another suitable manner). After setting the playback control selection mechanism 314, the user can select the continue from cursor selection mechanism 312, which causes the email-client correction interface 260 to play the audio data starting at the cursor position minus the number of seconds specified by the playback control selection mechanism 314.
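By way of illustration only, the continue-from-cursor behavior described above can be sketched as a lookup from the cursor position to an audio start time, less the specified number of seconds. The sketch assumes that each word of the text data carries a character offset and an audio start time; that alignment data, the function name, and the choice to start from the first word at or after the cursor are assumptions of this sketch rather than details of the described embodiments.

```python
from bisect import bisect_left
from typing import Sequence


def playback_start_time(
    word_offsets: Sequence[int],
    word_start_times: Sequence[float],
    cursor_offset: int,
    lead_in_seconds: float = 0.0,
) -> float:
    """Return the audio time (in seconds) at which playback should begin.

    word_offsets: sorted character offsets at which each word starts in the text.
    word_start_times: audio start time of each word, in the same order.
    """
    if len(word_offsets) != len(word_start_times):
        raise ValueError("offsets and start times must have the same length")
    if not word_offsets:
        return 0.0
    # Assumption: play from the first word starting at or after the cursor;
    # a cursor past the last word falls back to the last word.
    index = min(bisect_left(word_offsets, cursor_offset), len(word_offsets) - 1)
    # Back up by the lead-in, but never before the start of the audio.
    return max(0.0, word_start_times[index] - lead_in_seconds)


# Text: "Hello there. Once upon a time" (the word "Once" starts at offset 13).
offsets = [0, 6, 13, 18, 23, 25]
times = [0.0, 0.5, 5.0, 5.4, 5.9, 6.1]
start = playback_start_time(offsets, times, cursor_offset=13, lead_in_seconds=3.0)
```

With the cursor placed before “Once” and a three-second lead-in, `start` is two seconds into the audio; a lead-in longer than the elapsed audio is clamped to the beginning.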

In some embodiments, the recording control area 304 also includes a speed control mechanism (not shown) that allows a user to decrease and increase the playback speed of audio data. For example, the recording control area 304 includes a speed control mechanism that includes one or more selection mechanisms (e.g., buttons, timelines, etc.). A user can select (e.g., click, drag, etc.) the selection mechanisms in order to increase or decrease the playback speed of the audio data by a particular amount. In some embodiments, the speed control mechanism can also include a selection mechanism that a user can select in order to play audio data at normal speed.

In some embodiments, a user can hide the recording control area 304. For example, as shown in FIG. 13, the correction view area 302 can include one or more selection mechanisms 315 (e.g., tabs) that enable a user to choose whether to view the text data 303 only (e.g., by selecting a full text tab 315 a) or to view the text data 303 and the recording control area 304 (e.g., by selecting a listen/text tab 315 b).

The correction view area 302 can also include a save selection mechanism 316. A user can select the save selection mechanism 316 in order to save the current state of the corrected text data 303. A user can select the save selection mechanism 316 at any time during the correction process.

The correction view area 302 can also include a table 318 that lists, among other things, the system's confidence in its transcription quality. For example, as shown in FIG. 13, the correction view area 302 can list the total number of words in the text data 303, the number of low-confidence words in the text data 303, the number of medium-confidence words in the text data 303, and/or the number of high-confidence words in the text data 303. “Low” words can include words that are least likely to be correct. “Medium” words can include words that are moderately likely to be correct. “High” words can include words that are very likely to be correct. In some embodiments, if the number of low words in the text data 303 is close to the number of total words in the text data 303, it may be useful for the user to delete the text data 303 and manually retype the text data 303 by listening to the corresponding audio data. This situation may occur if the audio data was received from an audio data source that the system 10 has not previously received data from or has not previously received significant data from.
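By way of illustration only, the counts listed in the table 318 can be sketched as a tally of per-word confidence scores. The numeric thresholds, the assumption that each word carries a score between 0 and 1, and the fraction used to suggest retyping are all hypothetical values chosen for this sketch; the described embodiments do not specify them.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical thresholds; the source does not define numeric boundaries.
LOW_BELOW = 0.5
HIGH_FROM = 0.85


def confidence_band(score: float) -> str:
    """Classify one word's confidence score as low, medium, or high."""
    if score < LOW_BELOW:
        return "low"
    if score >= HIGH_FROM:
        return "high"
    return "medium"


def summarize_confidence(scores: Sequence[float]) -> Dict[str, int]:
    """Build the total, low, medium, and high word counts for a transcription."""
    counts = Counter(confidence_band(s) for s in scores)
    return {
        "total": len(scores),
        "low": counts["low"],
        "medium": counts["medium"],
        "high": counts["high"],
    }


def suggest_retype(summary: Dict[str, int], low_fraction: float = 0.9) -> bool:
    """Flag transcriptions in which nearly every word is low confidence."""
    return summary["total"] > 0 and summary["low"] / summary["total"] >= low_fraction


summary = summarize_confidence([0.2, 0.6, 0.9, 0.95, 0.4])
```

Here `suggest_retype` stands in for the observation that retyping may be useful when the number of low words approaches the total number of words; the 0.9 cutoff is only a placeholder.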

Returning to FIG. 8, after a user selects a transcription to correct, the user corrects the transcription as necessary via the email-client correction interface 260 (step 320) and submits or saves the corrected transcription (step 322). As described above with respect to FIG. 13, to submit or save corrected text data 303, a user can select the save selection mechanism 316 included in the correction page 300. In some embodiments, when a user selects the save selection mechanism 316, the email-client correction interface 260 displays a save options page 330, as shown in FIG. 14. The save options page 330 can include the navigation area 274, as described above with respect to FIGS. 12 and 13, and a save options view area 332. The save options view area 332 can display one or more selection mechanisms for saving the current state of the corrected text data 303. For example, as shown in FIG. 14, the save options view area 332 can include a save recording selection mechanism 334, a save and mark as complete selection mechanism 336, and a save, mark as complete and send to owner selection mechanism 338. A user can select the save recording selection mechanism 334 in order to save the current state of the text data 303 with any corrections made by the user. The user is then returned to the main page 272. A user may select the save recording selection mechanism 334 if the user has not finished making corrections to the text data 303 but wants to stop working on the corrections at the current time. A user may also select the save recording selection mechanism 334 if the user wants to periodically save corrections when working on long transcriptions. In some embodiments, the save recording selection mechanism 334 is the default selection.

A user can select the save and mark as complete selection mechanism 336 in order to save the corrections made by the user and move the transcription to the user's history folder. Once the corrections are saved and moved to the history folder, the user can access the corrected transcription (e.g., via the history page of the email-client correction interface 260) but may not be able to edit the corrected transcription.

A user can select the save, mark as complete and send to owner selection mechanism 338 in order to save the corrected transcription, move the corrected transcription to the user's history folder, and send the corrected transcription and/or the associated audio data to the owner or destination of the audio data (e.g., the owner's email address). As described above, a destination for corrected transcriptions can include files and multiple devices running email clients. For example, the email-client correction interface 260 can send a message notification to the owner of the transcription that includes the corrected transcription (e.g., as text within the message or as an attached file). FIG. 15 illustrates an email message notification 339 according to an embodiment of the invention. As shown in FIG. 15, the notification 339 includes the corrected transcription.

Once a user selects a save option, the user can select an accept selection mechanism 340 in order to accept the selected option or can select a cancel selection mechanism 342 in order to cancel the selected option. In some embodiments, if a user selects the cancel selection mechanism 342, the email-client correction interface 260 returns the user to the correction page 300.
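By way of illustration only, the three save options described above can be sketched as an enumeration mapped to the steps each option triggers. The enumeration, the step names, and the function are hypothetical and are used only to show that each option builds on the one before it.

```python
from enum import Enum
from typing import List


class SaveOption(Enum):
    # Labels follow the three selection mechanisms 334, 336, and 338.
    SAVE_RECORDING = "save recording"
    SAVE_AND_COMPLETE = "save and mark as complete"
    SAVE_COMPLETE_AND_SEND = "save, mark as complete and send to owner"


def steps_for(option: SaveOption) -> List[str]:
    """Return the ordered steps (hypothetical names) performed for a save option."""
    steps = ["save_corrections"]  # every option saves the current corrections
    if option in (SaveOption.SAVE_AND_COMPLETE, SaveOption.SAVE_COMPLETE_AND_SEND):
        steps.append("move_to_history_folder")
    if option is SaveOption.SAVE_COMPLETE_AND_SEND:
        steps.append("send_to_owner")
    return steps


default_steps = steps_for(SaveOption.SAVE_RECORDING)
```

In this sketch, accepting an option would run the returned steps in order, while cancelling would run none of them and return the user to the correction page.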

A user can also select a complete selection mechanism 296 included in the main page 272 of the email-client correction interface 260 in order to submit or save transcriptions. In some embodiments, if a user selects a complete selection mechanism 296 included in the main page 272, the email-client correction interface 260 displays the save options page 330 as described above with respect to FIG. 14. In other embodiments, if a user selects a complete selection mechanism 296 included in the main page 272, the email-client correction interface 260 automatically saves any previous corrections made to the transcription associated with the complete selection mechanism 296, moves the corrected transcription to the user's history folder, and sends the completed transcription and/or the corresponding audio data to the owner or destination associated with the transcription.

The embodiments described above and illustrated in the figures are presented by way of example only and are not intended as a limitation upon the concepts and principles of the invention. As such, it will be appreciated by one having ordinary skill in the art that various changes in the elements and their configuration and arrangement are possible without departing from the spirit and scope of the present invention. For example, in some embodiments the transcription server 20 utilizes multiple threads to transcribe multiple files concurrently. This process can use a single database or a cluster of databases holding temporary information to assist in multiple thread transcription on the same or different machines. Each system or device included in embodiments of the present invention can also be implemented by one or more machines and/or one or more virtual machines.
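By way of illustration only, the use of multiple threads to transcribe multiple files concurrently can be sketched with a thread pool. The `transcribe` function below is a placeholder that returns a fixed string rather than performing any recognition, and the function names and worker count are assumptions of this sketch; the database or cluster of databases holding temporary information is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def transcribe(audio_path: str) -> str:
    # Placeholder for a real transcription step; no audio is actually read.
    return f"[transcription of {audio_path}]"


def transcribe_all(audio_paths: List[str], max_workers: int = 4) -> Dict[str, str]:
    """Transcribe several files concurrently, one task per file."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        texts = list(pool.map(transcribe, audio_paths))
    return dict(zip(audio_paths, texts))


results = transcribe_all(["a.wav", "b.mp3", "c.wav"])
```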

Various features and advantages of the invention are set forth in the following claims.

1. A method of requesting a transcription of audio data, the method comprising: displaying a send-for-transcription button within an email-client interface on a computer-controlled display; and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button.
2. The method of claim 1, further comprising displaying a status of the selected email message within the email-client interface, wherein the status indicates at least one of whether the selected email message has been sent to the transcription server, whether transcribed text based on the associated audio data has been received, and whether corrected text data has been received associated with the transcribed text.
3. The method of claim 2, further comprising playing the associated audio data within the email-client interface so that the audio data is audible to a user.
4. The method of claim 1, further comprising receiving the transcription of the associated audio data from the transcription server.
5. The method of claim 4, further comprising displaying the transcription of the associated audio data to a user within the email-client interface.
6. The method of claim 5, further comprising receiving corrected text data associated with the transcription of the associated audio data from a user within the email-client interface.
7. The method of claim 6, further comprising sending the corrected text data to the transcription server.
8. A system for requesting a transcription of audio data, the system comprising: a transcription server; an email-client interface displaying at least one email message associated with audio data to a user, displaying a send-for-transcription button to the user, receiving a selection of the at least one email message from the user, receiving a selection of the send-for-transcription button from the user, and automatically sending the at least one email message and associated audio data to the transcription server as a request for a transcription of the associated audio data in response to the user's selection of the send-for-transcription button.
9. The system of claim 8, wherein the email-client interface displays a status associated with the at least one email message, wherein the status includes at least one of whether the at least one email message has been sent to the transcription server, whether transcribed text based on the associated audio data has been received, and whether corrected text data has been received associated with the transcribed text.
10. The system of claim 8, wherein the email-client interface plays the associated audio data so that the associated audio data is audible to a user.
11. The system of claim 8, wherein the transcription server generates the transcription of the associated audio data based on a voice independent model.
12. The system of claim 8, wherein the transcription server identifies an account associated with the at least one email message based on at least one of an email address and an internet protocol address associated with the at least one email message.
13. The system of claim 12, wherein the transcription server obtains stored account settings associated with the identified account, the account settings including at least one of transcribed text delivery settings, transcription settings, and transcription format settings.
14. The system of claim 13, wherein the transcription server generates the transcription of the associated audio data based on the account settings.
15. The system of claim 8, wherein the transcription server generates the transcription of the associated audio data and sends the transcription of the associated audio data to the email-client interface.
16. The system of claim 15, wherein the email-client interface displays the transcription of the associated audio data to a user, receives corrected text data associated with the transcription of the associated audio data from the user, and sends the corrected text data to the transcription server.
17. The system of claim 16, wherein the transcription server modifies a voice-independent model based on the corrected text data.
18. A system for generating a transcription of audio data, the system comprising: a transcription server configured to receive at least one email message and associated audio data from an email-client, identify an account based on the at least one email message, and obtain stored account settings associated with the identified account; and a translation server configured to generate a transcription of the associated audio data based on the account settings and a voice-independent model.
19. The system of claim 18, wherein the account settings include at least one of transcribed text delivery settings, transcription settings, and transcription format settings.
20. The system of claim 18, wherein the transcription server identifies an account based on at least one of an email address and an internet protocol address associated with the at least one email message.